3 research outputs found

    Welfare Diplomacy: Benchmarking Language Model Cooperation

    The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.
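
    As a hedged illustration of the general-sum objective described above, the toy Python sketch below scores a Welfare Diplomacy-style game. It assumes a simplified rule in which a power banks one welfare point per year for each supply center it does not use to support a military unit, and social welfare is the sum of all powers' accumulated points; the class and function names are illustrative, not the open-source engine's API.

```python
# Toy sketch of a Welfare Diplomacy-style general-sum score (illustrative only,
# not the open-source engine's API). Assumption: each year a power banks one
# welfare point per supply center it controls but does not use to support a
# unit, and social welfare is the sum of every power's accumulated points.
from dataclasses import dataclass


@dataclass
class PowerState:
    supply_centers: int
    units: int
    welfare_points: int = 0

    def collect_welfare(self) -> None:
        """Bank welfare points for this year's unused supply-center capacity."""
        self.welfare_points += max(0, self.supply_centers - self.units)


def social_welfare(powers: dict[str, PowerState]) -> int:
    """General-sum objective: total welfare accumulated across all powers."""
    return sum(p.welfare_points for p in powers.values())


# Example: a power that demobilizes (fewer units than centers) trades military
# strength for welfare, which is the tension the abstract describes.
powers = {
    "FRANCE": PowerState(supply_centers=5, units=3),
    "GERMANY": PowerState(supply_centers=6, units=6),
}
for power in powers.values():
    power.collect_welfare()
print(social_welfare(powers))  # FRANCE banks 2 points, GERMANY banks 0 -> 2
```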

    SuperHF: Supervised Iterative Learning from Human Feedback

    While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, powering a host of open-source models, while RLHF is a more sophisticated method used in top-tier models like ChatGPT but also suffers from instability and susceptibility to reward hacking. We propose a novel approach, Supervised Iterative Learning from Human Feedback (SuperHF), which seeks to leverage the strengths of both methods. Our hypothesis is two-fold: that the reward model used in RLHF is critical for efficient data use and model generalization, and that the use of Proximal Policy Optimization (PPO) in RLHF may not be necessary and could contribute to instability issues. SuperHF replaces PPO with a simple supervised loss and a Kullback-Leibler (KL) divergence prior. It creates its own training data by repeatedly sampling a batch of model outputs and filtering them through the reward model in an online learning regime. We then break down the reward optimization problem into three components: robustly optimizing the training rewards themselves, preventing reward hacking (exploitation of the reward model that degrades model performance) as measured by a novel METEOR similarity metric, and maintaining good performance on downstream evaluations. Our experimental results show that SuperHF exceeds PPO-based RLHF on the training objective, easily and favorably trades off high reward against low reward hacking, improves downstream calibration, and performs the same on our GPT-4 based qualitative evaluation scheme, all while being significantly simpler to implement, highlighting SuperHF's potential as a competitive language model alignment technique.
    Comment: Accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023.
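
    A minimal sketch of the loop the abstract describes may help make the method concrete: sample a batch of completions from the current policy, filter them through the reward model, then fine-tune on the survivors with a supervised loss plus a KL penalty toward a fixed prior. The helper names below (policy, prior, reward_model, tokenizer) are placeholders assuming HuggingFace-style causal-LM interfaces; this is not the authors' implementation.

```python
# Hedged sketch of a SuperHF-style step: sample completions, filter them with
# the reward model, and supervised fine-tune on the survivors with a KL prior.
# All helper objects are placeholders, not the paper's code.
import torch
import torch.nn.functional as F


def superhf_step(policy, prior, reward_model, tokenizer, prompts,
                 optimizer, completions_per_prompt=4, kl_coef=0.1):
    # 1) Sample several candidate completions per prompt from the current policy.
    filtered = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = policy.generate(**inputs, do_sample=True, max_new_tokens=64,
                                  num_return_sequences=completions_per_prompt)
        # 2) Filter: keep only the highest-reward completion for this prompt.
        rewards = reward_model(outputs)  # assumed: one scalar score per sequence
        filtered.append(outputs[rewards.argmax()].unsqueeze(0))

    # 3) Supervised loss on the filtered completions plus a KL penalty toward
    #    the prior (e.g. the SFT checkpoint). For brevity the loss covers
    #    prompt tokens too; a real implementation would mask them out.
    optimizer.zero_grad()
    total_loss = torch.tensor(0.0)
    for seq in filtered:
        logits = policy(seq).logits
        with torch.no_grad():
            prior_logits = prior(seq).logits
        # Standard next-token cross-entropy on the kept completion.
        ce = F.cross_entropy(logits[:, :-1].flatten(0, 1), seq[:, 1:].flatten())
        # KL(policy || prior), keeping the policy close to its initialization.
        kl = F.kl_div(F.log_softmax(prior_logits, dim=-1),
                      F.log_softmax(logits, dim=-1),
                      log_target=True, reduction="batchmean")
        total_loss = total_loss + ce + kl_coef * kl
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```

    Because each call regenerates its own training batch from the current policy, repeating this step gives the online filtering-and-fine-tuning regime the abstract refers to.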

    Opportunities in Physics Education: Low-Cost Position Tracking for Use in Kinematics Labs

    Traditional introductory physics kinematics laboratories utilize a few different instruments for locating objects in motion, all of which have shortcomings. Some provide only timing data, which heavily restricts trajectories and data collection. Some instruments provide more measurements but restrict object shapes, orientations, and textures. Still others require extensive pre-processing. None of these traditional instruments provide two- or three-dimensional position data. New, low-cost, local positioning technology, based on radio-frequency wireless communications, is available that enables novel redesigns of physics laboratories. This technology provides two- and three-dimensional position measurements, continuously, at data rates of 10 Hz or faster, from any object to which it can be affixed. Our research group at Portland State University is exploring how this technology can be applied to reconstruct and improve introductory laboratories, making them easier to perform while increasing the amount of usable data gathered. Additionally, we seek to enhance the model-based learning experience in labs by confronting students with more diverse models than traditionally encountered. For example, we are pursuing applications in free-fall experiments, aerodynamic friction, two-dimensional motion, two-dimensional collisions, and tug-of-war competitions, as well as astronomy applications such as retrograde motion.
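
    As one hypothetical illustration of what such position data make possible, the short Python sketch below estimates g from simulated ~10 Hz vertical position samples of a free-fall drop by fitting a parabola. It is not part of the tracking system's software, and the numbers are invented for the example.

```python
# Hypothetical example (not the tracking system's software): estimate g from
# simulated ~10 Hz vertical position samples of a free-fall drop by fitting
# y(t) = y0 + v0*t - (g/2)*t**2 and reading off the quadratic coefficient.
import numpy as np

t = np.arange(0.0, 0.6, 0.1)                # time stamps in seconds (10 Hz)
y = 2.0 - 0.5 * 9.81 * t**2                 # ideal drop from 2 m
y += np.random.normal(0.0, 0.005, t.size)   # a few mm of simulated sensor noise

coeffs = np.polyfit(t, y, 2)                # quadratic least-squares fit
g_estimate = -2.0 * coeffs[0]               # leading coefficient is -g/2
print(f"Estimated g: {g_estimate:.2f} m/s^2")
```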